Statistical Research (统计研究), 2022, Vol. 39, Issue 1: 122-131. doi: 10.19343/j.cnki.11-1302/c.2022.01.009

Integration of Online and Offline Survey Data: Taking Calibration Based on Pseudo-Design Inference as an Example

Jin Yongjin (金勇进), Liu Xiaoyu (刘晓宇)

  • Online: 2022-01-25   Published: 2022-01-25

Abstract: Given the current survey environment and the growth of the Internet, mixed online and offline surveys are now widely used. How to integrate the two data sources, so as to reduce wasted information, use data resources sensibly, and obtain valid, high-precision estimates, is a serious challenge for survey inference in the era of big data. For the case where the online sample is a non-probability sample and the offline sample is a probability sample, this paper proposes two basic approaches to data integration. The first applies a "probabilistic test" to the non-probability sample and then combines the two types of data for statistical inference. The second uses the information provided by the probability sample to "pseudo-randomize" the non-probability sample. Focusing on the second approach, the paper takes calibration estimation based on propensity-score pseudo-weights as an example, discusses a concrete solution and the associated variable selection problem, and verifies the method through simulation.

Key words: Data Integration, Mixed Samples, Pseudo Weight, Propensity Score, Calibration
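
The second approach in the abstract (propensity-score pseudo-weights followed by calibration) can be illustrated with a minimal sketch. The abstract does not give the estimator's details, so everything below is an assumption made for illustration: the pooled weighted logistic model, the odds-based pseudo-weight (1 - p) / p, the linear (GREG-type) calibration, the function names, and the synthetic data. It is not the authors' implementation.

```python
import numpy as np


def fit_weighted_logit(X, z, w, n_iter=50, tol=1e-8):
    """Weighted logistic regression fitted by Newton-Raphson.

    X must already contain an intercept column.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (z - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def propensity_pseudo_weights(X_np, X_p, d_p):
    """Pseudo-weights for the non-probability (online) sample.

    Assumed variant: pool both samples, give online units weight 1 and
    offline units their design weights d_p, model membership in the
    online sample, and use the inverse odds (1 - p) / p as pseudo-weight.
    """
    X = np.vstack([X_np, X_p])
    z = np.concatenate([np.ones(len(X_np)), np.zeros(len(X_p))])
    w = np.concatenate([np.ones(len(X_np)), d_p])
    beta = fit_weighted_logit(X, z, w)
    p = 1.0 / (1.0 + np.exp(-X_np @ beta))
    return (1.0 - p) / p


def linear_calibration(X, d, totals):
    """Linear calibration: w_i = d_i * (1 + x_i' lam), with lam chosen so
    that the weighted sums of the columns of X equal `totals`."""
    lam = np.linalg.solve((X * d[:, None]).T @ X, totals - X.T @ d)
    return d * (1.0 + X @ lam)


# Synthetic illustration (all numbers are made up for the demo).
rng = np.random.default_rng(0)
N = 20000
x = rng.normal(size=N)

# Offline probability sample: simple random sample with weights N / n.
n_p = 500
idx_p = rng.choice(N, n_p, replace=False)
d_p = np.full(n_p, N / n_p)

# Online non-probability sample: participation depends on x.
pi_np = 1.0 / (1.0 + np.exp(-(-3.0 + 0.8 * x)))
idx_np = np.flatnonzero(rng.random(N) < pi_np)

X_p = np.column_stack([np.ones(n_p), x[idx_p]])
X_np = np.column_stack([np.ones(len(idx_np)), x[idx_np]])

pseudo_w = propensity_pseudo_weights(X_np, X_p, d_p)
totals = X_p.T @ d_p  # benchmark totals estimated from the offline sample
cal_w = linear_calibration(X_np, pseudo_w, totals)
```

In this sketch the calibrated weights `cal_w` reproduce the benchmark totals of the auxiliary variables by construction. Linear calibration can produce negative weights when the pseudo-weights are far from the benchmarks, which is one reason a bounded distance function (such as raking) might be preferred in practice.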